Visual Programming: Compositional visual reasoning without training
Tanmay Gupta, Aniruddha Kembhavi
CVPR 2023 (Best Paper)
A win for symbolic AI!
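VISPROG is a modular and interpretable neuro-symbolic system for compositional visual reasoning. It needs no task-specific training: a large language model, prompted with in-context examples, generates a Python-like modular program, which is then executed to produce both the answer and an interpretable rationale.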
Implementation of a VISPROG module.
from typing import Any, Dict, List, Tuple

class VisProgModule():
    def __init__(self):
        # load a trained model; move to GPU
        self.model = None  # placeholder: a subclass loads its pretrained model here

    def html(self, inputs: List, output: Any) -> str:
        # return an html string visualizing step I/O
        raise NotImplementedError

    def parse(self, step: str) -> Tuple[List, List[str], str]:
        # parse step and return list of input values
        # and variables, and output variable name
        raise NotImplementedError

    def some_computation(self, inputs: List) -> Any:
        # placeholder: run the loaded model on the inputs
        raise NotImplementedError

    def execute(self, step: str, state: Dict):
        inputs, input_var_names, output_var_name = \
            self.parse(step)
        # get values of input variables from state
        for var_name in input_var_names:
            inputs.append(state[var_name])
        # perform computation using the loaded model
        output = self.some_computation(inputs)
        # update state
        state[output_var_name] = output
        # visual summary of the step computation
        step_html = self.html(inputs, output)
        return output, step_html
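A module like the one above handles a single step. A full program is a sequence of such steps, executed in order against a shared state of named variables. The sketch below assumes a step syntax in the style of the paper's programs (e.g. `BOX0=LOC(image=IMAGE,object='dog')`) and swaps the module classes for plain functions in a hypothetical registry. `parse_step`, `run_program`, and the toy `LOC`/`COUNT`/`RESULT` stand-ins are illustrative names, not the official VISPROG code, and the HTML visualization is omitted.

```python
import ast
from typing import Any, Callable, Dict, Tuple


def parse_step(step: str) -> Tuple[str, str, Dict[str, Any], Dict[str, str]]:
    """Split "OUT=MODULE(k=VAR,k2='lit')" into its parts.

    Returns (output variable, module name, literal args, variable args).
    """
    stmt = ast.parse(step.strip()).body[0]
    if not (isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Call)):
        raise ValueError(f"not a module call: {step!r}")
    out_var = stmt.targets[0].id
    module_name = stmt.value.func.id
    literals: Dict[str, Any] = {}
    variables: Dict[str, str] = {}
    for kw in stmt.value.keywords:
        if isinstance(kw.value, ast.Name):
            variables[kw.arg] = kw.value.id  # refers to an earlier result in state
        else:
            literals[kw.arg] = ast.literal_eval(kw.value)  # e.g. 'dog', 3, [1, 2]
    return out_var, module_name, literals, variables


def run_program(program: str,
                modules: Dict[str, Callable[..., Any]],
                state: Dict[str, Any]) -> Dict[str, Any]:
    """Execute steps in order, storing each step's output in `state`."""
    for line in program.strip().splitlines():
        if not line.strip():
            continue
        out_var, name, literals, variables = parse_step(line)
        # look up variable arguments in state, keep literal arguments as-is
        kwargs = {**literals, **{k: state[v] for k, v in variables.items()}}
        state[out_var] = modules[name](**kwargs)
    return state


# Toy stand-ins for real modules (no neural models involved).
toy_modules = {
    "LOC": lambda image, object: [box for label, box in image if label == object],
    "COUNT": lambda box: len(box),
    "RESULT": lambda var: var,
}
# A toy "image": (label, box) pairs instead of pixels.
image = [("dog", (0, 0, 10, 10)), ("cat", (5, 5, 20, 20)), ("dog", (30, 30, 40, 40))]
program = """
BOX0=LOC(image=IMAGE,object='dog')
ANSWER0=COUNT(box=BOX0)
FINAL_RESULT=RESULT(var=ANSWER0)
"""
state = run_program(program, toy_modules, {"IMAGE": image})
print(state["FINAL_RESULT"])  # prints 2
```

Because each step is syntactically a Python assignment with keyword arguments, the sketch reuses Python's `ast` module instead of splitting on commas, so a quoted argument such as a question containing a comma still parses correctly.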
Tasks
Modules currently supported in VISPROG.
GQA prompts
Knowledge tagging prompts
...
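The GQA and knowledge-tagging prompts are in-context examples: the language model sees several instruction-program pairs, then the new instruction, and continues the text with a program. A minimal sketch of that prompt assembly follows, assuming a hypothetical `Instruction:`/`Program:` template; the paper's exact prompt wording is not reproduced here, and `build_prompt` is an illustrative name.

```python
from typing import List, Tuple


def build_prompt(examples: List[Tuple[str, str]], instruction: str) -> str:
    """Concatenate (instruction, program) demonstrations, then the new instruction.

    The language model is expected to continue the text after the final 'Program:'.
    """
    blocks = [f"Instruction: {q}\nProgram:\n{p.strip()}" for q, p in examples]
    blocks.append(f"Instruction: {instruction}\nProgram:\n")
    return "\n\n".join(blocks)


# One hypothetical GQA-style demonstration.
examples = [
    ("How many dogs are in the image?",
     "BOX0=LOC(image=IMAGE,object='dog')\n"
     "ANSWER0=COUNT(box=BOX0)\n"
     "FINAL_RESULT=RESULT(var=ANSWER0)"),
]
prompt = build_prompt(examples, "How many cats are in the image?")
print(prompt)
```

Swapping in a different list of demonstrations (for example, tagging instructions) changes the task without any training, which is the point of the in-context approach.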